Conformational free energies of methyl-a-L-iduronic and methyl-/?-D-glucuronic acids 

in water. 
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We present a simulation protocol that allows for efficient sampling of the degrees of freedom of a 
solute in explicit solvent. The protocol involves using a non-equilibrium umbrella sampling method, 
in this case the recently developed adaptively biased molecular dynamics method, to compute an 
approximate free energy for the slow modes of the solute in explicit solvent. This approximate 
free energy is then used to set up a Hamiltonian replica exchange scheme that samples both from 
biased and unbiased distributions. The final accurate free energy is recovered via the WHAM technique 
applied to all the replicas, and equilibrium properties of the solute are computed from the unbiased 
trajectory. We illustrate the approach by applying it to the study of the puckering landscapes of the 
methyl glycosides of Q-L-iduronic acid and its C5 epimer /3-D-glucuronic acid in water. Big savings 
in computational resources are gained in comparison to the standard parallel tempering method. 
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I. INTRODUCTION 

The field of glycobiology has undergone rapid growth 
since its name was coined in 1988^ to refer to the study 
of the molecular structure and biology of saccharides (or 
glycans) as they happen in the cell. Linear polysaccha- 
rides are ubiquitous in nature. Glycoproteins, proteogly- 
cans and glycolipids are the most common glycoconju- 
gates in mammalian cells (found mainly in the outer cell 
wall and fluids). In particular, glycosaminoglycans (hep- 
arin, heparan sulfate, chondroitin, keratin, dermatan, 
and many others) comprise a disaccharide unit formed 
by two aminosugars linked in an alternating fashion with 
uronic acids^. In this work we consider L-iduronic acid 
(IdoA), which is the major uronic acid component of the 
glycosaminoglycans dermatan sulfate and heparin^. We 
also consider its C5 epimer, D-glucuronic acid (GlcA), 
that dominates in chondroitin and hyaluronan. The IdoA 
is an uncommon molecule among the pyranoses (carbo- 
hydrates whose chemical structure is based upon a six- 
membered ring of five carbon atoms and one oxygen) 
because it adopts several conformations, and therefore it 
is believed to increase the flexibility and conformational 
freedom of the corresponding polysaccharide chains^. 

Computational studies of polysaccharides have evolved 
along several directions. In particular, considerable work 
has been devoted to quantum chemical studies of the 
energetics of relatively small compounds (see, for exam- 
ple, RefQ] and references therein). While very accurate, 
such ab-initio calculations typically cannot handle ex- 
plicit solvent over the nanosecond time scale. Yet, it is 
known that solvation plays a crucial role in the confor- 
mational preferences of carbohydrates^. Until recently, 
only coarse-grained models were able to breach the long 
time scales involved in conformational sampling. How- 
ever, methodological advances coupled to growing com- 
putational power have allowed the realization of all-atom, 
explicit solvent classical molecular dynamics over hun- 
dreds of nanoseconds^. This of course requires the devel- 
opment and improvement of force- fields 8 , which need to 



be validated by comparison with experiments. Reliable 
validation necessitates an accurate determination of the 
properties of the compounds in explicit solvent^, that can 
be computationally very demanding. 

Often, the dynamical aspect of the trajectory gener- 
ated in the course of a molecular dynamics (MD) simula- 
tion is not of interest per se, and the trajectory is used 
to study an ensemble statistics of some quantities. It 
was realized long time ago that MD is quite limited when 
employed for sampling purposes because the trajectory 
gets trapped in the vicinity of the free energy minima 
and is very unlikely to cross barriers higher than a few 
ksT. Many approaches to enhance the sampling have 
been proposed, with different measures of success. It is 
beyond the scope of this paper to review the different 
methods (the reader can find a review in Refi), but we 
will briefly go over the two methods employed in this 
work, namely the "replica exchange" method* - and the 
adaptively biased molecular dynamics method^. 

The idea behind the replica exchange method is to con- 
sider several MD trajectories that are generated using dif- 
ferent Hamiltonians and exchange those Hamiltonians (or 
trajectories) in a way that preserves detailed balance so 
that all the Boltzmann distributions that correspond to 
the Hamiltonians involved get sampled simultaneously in 
a mutually enhancing way. The overall performance of 
the method is determined by the "right" choice of the 
Hamiltonians involved in the construction. For exam- 
ple, "parallel tempering" (the most popular incarnation 
of the method) builds the family of Hamiltonians by re- 
scaling the original Hamiltonian. This is identical to 
the situation where one Hamiltonian is used at differ- 
ent temperatures - high-temperature replicas have higher 
barrier-crossing rates and continuously "seed" the low- 
temperature replicas with different configurations. The 
advantage of this method is that it is very general re- 
quiring little prior knowledge about the system. The dis- 
advantage is that, as the number of degrees of freedom 
increases, more and more replicas are required to cover 
the desired range of temperatures while maintaining sat- 
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isfactory exchange rates. In the pure parallel tempering 
case, the number of replicas has been shown to grow as 
the square root of the number of degrees of freedo m 12 ! 13 . 

Over the past few years, several methods targeting 
the computation of the free energy using non-equilibrium 
simulations have become popular. These methods all es- 
timate the free energy as a function of "collective vari- 
ables" from an "evolving" ensemble of realization o 14 ' 15 , 
and use that estimate to bias the system dynamics, so 
as to flatten the effective free energy surface. One of the 
first methods that used the idea of a dynamically evolving 
potential that forces the system to explore unexplored re- 
gions of the phase space was the local elevation method 16 . 
More recent developments include the adaptive force bias 
method^, the Wang-Landau approach^, and the non- 
equilibrium metadynamic s 19 ' 20 method. Collectively, all 
these methods may be considered as umbrella sampling 
methods in which the biasing force compensates for the 
free energy gradient. In the long time limit, the bias- 
ing potential eventually reproduces the previously un- 
known free energy surface. The adaptively biased molec- 
ular dynamics (ABMD) method^ grew out of our efforts 
to streamline the metadynamics method for classical 
biomolecular simulations 21 . It shares all the important 
characteristics of the above methods while including en- 
hancements in its own right such as a favorable 0(t) scal- 
ing with time t, and a fewer number of control parame- 
ters. Adaptive bias methods have been recently used to 
enhance the conformational sampling of small sugars in 
explicit water. These efforts include the use of the local 
elevation method to calculate the relative free energies 
and interconversion barriers of the ring conformers of 0- 
D-glucopyranose in water—, and the use of metadynam- 
ics to recover the conformational free energy surface of 
a-N-Acetylneuraminic acid 23 and /3-D-Glucopyranosc 24 . 
The latter work is an important contribution to the force- 
field validation process. 

It is possible to harness the power of replica exchange 
with a suitable family of Hamiltonians that are built 
by biasing the original Hamiltonian along relevant slow 
modes. These biasing potentials can be readily computed 
using the adaptive bias methods. In this paper, we com- 
bine the ABMD method with the replica exchange and the 
weighted histogram analysis (WHA M 25 ' 26 ) methods in order 
to achieve robust sampling of the conformational space 
of the pyranose ring in explicit solvent, and recover the 
potential of mean force (free energy) associated with two 
intra-ring dihedral angles describing the slow puckering 
modes. The free energy alone could be computed by 
"correcting" the biasing potential obtained by ABMD with 
an equilibrium run, as discussed at length in Refl2ll. The 
approach explored in this paper replaces the "correction" 
step with a replica-exchange simulation that both "cor- 
rects" the approximate biasing potential and efficiently 
samples unbiased equilibrium configurations. 

Our study has two main goals. First, we present a sim- 
ulation protocol that allows for efficient sampling of the 
degrees of freedom of a solute in explicit solvent. Second, 



we illustrate how this approach can be used to validate 
the new GLYCAM 06 force field^ by means of applying it 
to the study of the puckering landscapes of the methyl 
glycosides of a-L-iduronic acid and its C5 epimer /3-D- 
glucuronic acid (see FigfT]). 

The paper is organized as follows: the methodology 
is described in the next section which is followed by the 
presentation of the simulation protocol along with the 
results. The paper then ends with a short summary. 

II. METHODS 

Let us consider a system of N classical particles. The 
state of the system can be described by the positions of 
the particles ri, . . . , rjv and their momenta px, . . . ,Pjv- 
Let 

JV 2 

^ = E^ + <i> ( ri '--->^) (!) 

i 2m a 

a— 1 

be the Hamiltonian of the system. Here, m a are the 
masses of the particles and $(ri, . . . , r^r) is the potential 
energy. In the following, we omit the atomic indices if this 
does not lead to confusion, that is, r represents ri, . . . , rjv 
and p represents pi, . . . , pjy- 

A routine task in molecular modeling involves sampling 
configurations according to the canonical distribution 

P(p,r)oc e -^(P' r )/ fc ^ (2) 

(&b is the Boltzmann constant and T is the temperature). 
One way of carrying out such sampling is, for instance, to 
generate dynamics via the Langevin equation such that 
its limiting probability distribution is the desired equi- 
librium distribution- 2 ^. Unfortunately this basic strategy 
often fails because the system gets either trapped in the 
neighborhood of a potential energy minimum or locked in 
some region of phase space due to entropic bottlenecks. 
Hence, other more elaborate approaches have been pro- 
posed throughout the years (see, for example, a review 
in Ref|9|). In this paper we describe how a combination 
of the replica exchange^ -, ABMD 1 ^ and WHA M 25 ' 26 methods 
can be used to enhance sampling, and illustrate the use 
of this protocol by applying it to the study of small so- 
lutes in explicit solvent. For clarity and completion, we 
briefly review these methods below. 

A. Replica exchange method 

The replica exchange idea 1 ^ consists of considering sev- 
eral Hamiltonians, % a (p,r), a = 1,...,M (with the 
original H(p,r) being of one them), running the dy- 
namics for each of them, and trying to "cross-pollinate" 
these dynamics by exchanging the trajectories between 
the different Hamiltonians from time to time. More for- 
mally, the method can be implemented as follows: once 
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in a while a random pair of replicas is chosen and an 
exchange is attempted with the acceptance probability 
carefully tuned to maintain detailed balance in the ex- 
tended ensemble 10,2 - 
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1 

ksT b 



1, A<0, 
e" A , A>0, 



[4> a (r") - $ Q (r a )] 



(3) 



where T a ^ denote the temperatures of the replicas and 
only the potential energy is shown (the momenta are 
integrated out). In practice, the momenta get simply 
rescaled upon a successful exchange. In this scheme, 
high, inaccessible barriers in one replica can be smaller 
in another replica and hence this random walk over the 
replicas facilitates barrier crossings. Care must be taken 
with the choice of Hamiltonians since it determines the 
performance of the method. There are many variations 
of replica exchange, with the most popular one being 
"parallel tempering" : the family of the Hamiltonians is 
constructed from the rescaled copies of the original one 
(this is equivalent to having the same Hamiltonian with 
different temperatures). Parallel tempering has the ad- 
vantage of being quite general, but the applicability of 
the method can be limited by the fact that the number 
of replicas needed to cover a desired temperature range 
(while maintaining an efficient random walk across those 
temperatures) grows as the square root of the number of 
degrees of freedo m 12 i 13 . 



In the last equation G(£) denotes the kernel, which is 
thought to be positive and symmetric (in analogy to the 
kernel density estimator widely used in statistics^ -) . For 
large enough Tp (flooding timescale) and small enough 
kernel width, the biasing potential U (£, t) converges to- 
wards — /(£) as t — > oo^i£. 

The implementation of the method is straightforward: 
we use cubic B-splines^i (or products of thereof for mul- 
tidimensional collective variables) to discretize U(£,i) in 
"space" : 

U(t,t) = Y,U m (t)B(t/A{-m), 



(2-|£l) 3 /6, 1<|£|<2, 
B(0={ em~2)/2 + 2/3, 0<|C|<1, (4) 
0, otherwise. 

We choose the biweight kernel^ - for G(£): 



G(0 



_ 48 r (i-£2/ 4 ) 2 , -2 < e < 2 



41 I 0, 



otherwise 



and Euler-like discretization scheme for the time evolu- 
tion of the biasing potential: 

k T 

U m (t + At) = U m (t) + At-2—G\a/A£ - ml , 

TF 

where a = o~(r) is at time t. This discretization in time 
may readily be improved. However, this is not really 
important, since the goal is simply to flatten U (£, £)+/(£) 
in the long time limit. 



B. Adaptively biased molecular dynamics 



C. Combination of replica exchange and ABMD 



The ABMDii method is an umbrella sampling method 
with a time-dependent biasing potential that can be used 
to compute the Landau free energy of a collective variable 
er(r), which is understood to be a smooth function of the 
atomic positions. The Landau free energy is defined as 29 : 

f(0 = -k B Tky(S[Z-o-(v)}), 

where the angular brackets denote an ensemble average. 
The free energy /(£) (sometimes also referred to as the 
potential of mean force) describes the relative stability 
of states corresponding to different values of £, and can 
provide insight into the possible transitions between these 
states. 

In the ABMD method an additional time-dependent bi- 
asing potential is added to the potential energy of the 
system 

*ABM D (r,t) =$(t) + U[<t (r),t]. 

This biasing potential "floods" the true free energy land- 
scape as it evolves in time according to: 

dU&t) k B T 



dt 



TF 



-G[e-a(r)]. 



In this paper the ABMD method is used to construct the 
Hamiltonians for the replica exchange scheme. In par- 
ticular, we arc interested in the properties of a relatively 
small solute, with a few degrees of freedom, immersed 
in a much larger "bath" of solvent molecules. Parallel 
tempering is impractical for such a system due to the 
large total number of degrees of freedom, requiring many 
replicas that spend most of the time equilibrating the 
solvent at different temperatures. On the other hand, 
especially for small solutes, one can readily identify the 
slowest modes that are difficult to sample in regular MD on 
the grounds of physical and/or chemical intuition. Quite 
often those modes can be described as a motion along 
a collective variable a = a{v\, . . . , v£) that does not de- 
pend on the solvent coordinates explicitly. It is therefore 
reasonable to consider two (or more - see below) replicas, 
both running at the same temperature, with the poten- 
tial energies 

*„(r) = $(r), $ b (r) = $(r) - f[a(r u . . . ,r s )] , 

where the last term in ^(r) is the Landau free energy 
associated with the collective variable a computed for the 
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original potential energy $(r). This last term renders the 
states with the different values of the collective variable 
equally probably in the second replica, which thus plays 
the role of the high-temperature replicas in the parallel 
tempering scheme, "speeding" up the motions along the 
collective variable. The first replica uses the unmodified 
potential energy, thus producing unbiased samples of the 
configurations of the original system. Since both replicas 
run at the same temperature, the exchange probability 
given in Eq.Q does not depend explicitly on the poten- 
tial energy difference between the replicas, but on the 
biasing potentials as a function of the solute degrees of 
freedom. Therefore, the necessary number of replicas is 
expected to be much smaller than it would be for parallel 
tempering. 

The true free energy, /(£), is typically unknown in 
advance. We therefore use the ABMD method to com- 
pute a biasing potential U(^) that approximately com- 
pensates for the unknown free energy /(£) fs —£/(£) (up 
to an additive constant). This biasing potential is then 
used to set up the replica exchange scheme that yields 
both an enhanced sampling of the solute states and a 
more accurate Landau free energy /(£) associated with 
a(ri, . . . ,rs). The latter is reconstructed using the fa- 
miliar WHAM approac h 25 ' 26 briefly summarized in the next 
subsection. 

The overall success of the replica exchange scheme is 
determined by the efficiency of the random walk in the 
replica space. The latter can be quantified by the ex- 
change rates between the pairs of replicas. Let us con- 
sider two replicas running at the same temperature T 
with the potential energies 

$ a ,b( T ) = $(r) +Ua,b[u{T 1 ,...,T S )\, 

given by the original potential energy $(r) biased by 
the potentials U a ,b(£) acting on the collective variable 
a(ri, . . . , rs), which is the same in both replicas. The 
exchange probability Eq.© then becomes 

7> ab = e(A) e - A + e(-A), 

where Q(x) is the Heaviside step function, 

and £ 0) & are the values of the collective variable 
<j(ti, . . . , rs) in the corresponding replicas. The proba- 
bility densities of the collective variables can be expressed 
through the associated Landau free energy, /(£), com- 
puted for the original potential $(r): 

and can readily be used in the expression for the rate of 
exchange between the two replicas 



The integrals in the above formula are low-dimensional 
and can easily be computed numerically. 

From a practical point of view, the two replicas used in 
the preceding discussion often render the exchange rate 
unacceptably low. This can be solved by introducing 
more replicas, such that the range of biasing potentials 
goes from zero to the full (approximate) negative free en- 
ergy through some intermediates. For the present study 
we chose to use three replicas biased as follows: 

$i(r) = $(r), 

$ 2 (r) = $(r) + AC/[a(r 1 ,...,r s )], 
$ 3 (r) = $(r)+ U[o-(n,...,rs)], 

where the parameter A is found by imposing equality of 
exchange rates, TZ\2 = TZ-23- This equation is solved nu- 
merically. The resulting exchange rates turned out to be 
around 30% and were deemed satisfactory. Otherwise, 
more intermediate replicas would have to be introduced 
following the same reasoning. 

D. Weighted histogram analysis method 

Here we review the WHAM 25,2 - scheme, following the pre- 
sentation given in Refsf32l-[3~il The aim of the WHAM ap- 
proach is to infer the unbiased free energy out of several 
biased simulations. More precisely, let us consider M 
simulations biased by some potentials U m , m = 1, . . . ,M 
that depend on atomic positions through a collective vari- 
able cr(r), that is, 

U m = t/ TO [<r(r !,...,!>-)]. 

Let K m denote the number of values of the col- 
lective variable a collected in the m th simulation: 
6n, i, £ m , 2, ■ • • , £m,K m (with TO = 1, . . . , M) . By the 
end of the simulations, the whole sample T> contains 
K\ x K2 x • • • x Km values of the collective variable that 
are supposed to be statistically independent. 

The likelihood 1 to observe a single value of the collec- 
tive variable in the m th simulation is 

p(£\m) = ^- fdrS[^a(r)]e-^n 7 (5) 
where /3 = l/k B T, $,„ = $ + U m and 

The likelihood ((5|) can be expressed through the Landau 
free energy /(£) (which is to be determined) associated 
with the collective variable 



Tlab= d£, a d£ b Pa(£a)Pb{£,b)'Pab- 



1 Likelihood, p(A\B), is conditional probability of A given B. 



5 



\ ^COOH \ 



where 



and 



/™(fl = /(0 + MO. 

The likelihood of the whole dataset I? is 

i(/i©)= n 1 1 / ^ - 

m—l j — 1 

By introducing individual "densities" 

r m (Z) = K- 11 £ i 6(z-£ J ,m), 

3=1 

one can rewrite the likelihood ([BJ as follows 
-In L(f\V) = 

M 



m—l 



The maximum likelihood principle 
5 



L(f\V) = 



then leads to the celebrated WHAM equations 26 
e -j8/(0 



M 
m—l 



M 

J2 w me~ 

m—l 



where 



K„ 



M 
m—l 



(6) 



( 7 ) 



(8) 



(9) 



(10) 



In practic o 25 i 26 ' 35 , people typically use histograms to ap- 
proximate the densities Eq.([7J and then solve the WHAM 
equations Eq. ([10[) to find the weights r) m iteratively. 

Another possibility, which is exploited in this paper, 
is to maximize the likelihood with respect to the un- 
known free energy /(£) numerically. There are some is- 
sues with this approach too. First, the method of max- 
imum likelihood is not very useful if applied directly to 
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FIG. 1: Structure of the methyl glycosides of a-L-iduronic 
acid (IdoA, left) and its C5 epimer /3-D-glucuronic acid (GlcA, 
right). 



the likelihood given by Eq.© because the functional 
is maximized by a sum of delta function spikes at the 
observations^ .. The likelihood should be maximized over 
a particular class of functions instead. Technically this is 
typically implemented by adding a penalty term to the 
likelihoo d 36 i 37 . While theoretically interesting, in this 
paper we overcome the problem in a similar way that 
WHAM does: by smoothening the "densities" ([7}. Unlike 
the classical WHAM papers, where the densities are approx- 
imated by piece- wise constants (histograms), we use the 
so-called kernel density estimator for the biased densi- 
ties ([7]). That is, we replace r m (£) by r m (£) which is 
obtained from the former by using Gaussian kernels 



0(0 = 



1 



hy/2n 



exp 



(in 



in place of the ^-functions. 

Another issue with the functional (j8]) is that it is invari- 
ant with respect to the shifts of the free energy /(£) by 
a constant. While perfectly physical, this feature needs 
special attention in a numerical setting and hence it is 
desirable to get rid of it before hand. To this end, we 
consider the following functional 30,37 in place of the orig- 
inal one: 



L = e wm + 



■ wmVm 



(12) 



M . 



m—l 



It is easy to see that the unconstrained minimum of the 
L functional coincides with a constrained maximum of 
the original likelihood ([8]). The functional (TT2|) is thus 
much more convenient for numerical treatment since the 
constrained optimization problem gets reduced to an un- 
constrained one. 

Technically, we represent the unknown free energy /(£) 
in the cubic B-splines 3 ^ basis, just as we do with the bi- 
asing potential U (£) . The functional L then becomes 
a function of the coefficients. Finally, we minimize this 
function to find the unknown coefficients numerically us- 
ing the L-BFGS algorithmic^. 
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III. a-L-IDURONIC AND /3-D-GLUCURONIC 
METHYL GLYCOSIDES 

We consider methyl glycosides of the a-L-iduronic acid 
(IdoA), the major uronic acid component of the gly- 
cosaminoglycans dermatan sulfate and heparin^, and its 
C5 epimer, /3-D-glucuronic acid, predominant in heparan 
sulfate (FigfTJ). The IdoA residue is special among pyra- 
noses because it can adopt several conformations, thus 
giving flexibility and conformational freedom to the cor- 
responding polysaccharide chains 3 -. We concentrate on 
the conformational sampling of these molecules as param- 
eterized by the GLYCAM 06^ force-field in explicit TIP3P 40 
water. 

There are fourteen "canonical" puckering states of the 
pyranose ring that minimize the angle strain: two chairs, 
six boats and six skew-boats (see Refl4ll for a more com- 
prehensive presentation). The 4 Ci and 1 C , 4 chair confor- 
mations, the 2 Sq and 1 S , 3 skew-boats, and the nomen- 
clature for the ring carbon atoms, are sketched in FigJ5] 
The puckering states are named with a C, B or S let- 
ter for chair, boat or skew-boat respectively. The letter 
is then superscripted/subscripted with the number (s) of 
the atom(s) above/below the reference plane (the super- 
scripts are placed in front of the letter while the sub- 
scripts go after it). The reference plane is chosen to con- 
tain four of the ring atoms and if ambiguity occurs (in 
the chair and skew-boat conformers), the plane is chosen 
so that the lowest numbered carbon atom is out of this 
plane. 

For the majority of pyranoses, the 4 Ci chair conforma- 
tion is the only relevant one with the others being much 
higher in energy. The IdoA is an exception for which both 
4 Ci and 1 C*4 chairs have been observed 3 along with the 
2 <So skew-boat. Given the special place of the iduronic 
acid among the rest of the pyranoses, it is important to 
be able to accurately reproduce its conformational equi- 
librium in molecular dynamics simulations. However, the 
barriers separating the two chair conformations are of or- 
der of at least « lOtsT 7 *^, and thus the corresponding 
transitions are rather unlikely on the 10-100 nanosecond 
timescale. In particular, in Refl42l. the authors conclude: 
"The frequency of ring conformation interchange ... is 
too slow to allow the accurate prediction of the propor- 
tion of each form from molecular dynamics alone" . The 
combination of the methods described in the preceding 
section allows us to overcome this difficulty 



IV. RESULTS 

We use the intra-ring 05-C1-C2-C3 and C3-C4-C5- 
05 torsion angles (denoted a.\ and o>2 in the following) 
to distinguish among the different conformations. Our 
choice is not unique, but this is of minor importance 
since the dihedrals adopt distinctly different values for 
the conformations in question: the values of the angles 
are of different sign for the chairs (with 1 C I 4 correspond- 



ing to a x w -50°, a 2 « +50° and 4 Ci having a x w +50°, 
a 2 Ri —50°) and of the same sign for the boats and skew- 
boat conformations. 

Our goal is to sample the equilibrium configurations of 
the rings in the NTV ensemble using explicit solvent. We 
proceed in stages as follows: (1) compute with ABMD an 
approximate free energy for the dihedral angles using a 
relatively cheap and fast implicit solvent model; (2) intro- 
duce explicit solvent and "refine" the potential of mean 
force (PMF) from the previous step; (3) set up a replica ex- 
change scheme that uses the approximate PMF to sample 
from an unbiased canonical distribution while collecting 
the biased samples as well. The trajectories from the 
last step are then used to study equilibrium properties of 
the rings and to recover the accurate free energies using 
the WHAM technique. The general philosophy behind the 
protocol is to start with coarser, considerably cheaper 
approximations that are successively refined with more 
accurate, expensive approaches such that the final results 
have the accuracy of the expensive approach at a fraction 
of the cost. Thus we start with implicit solvent and then 
use these results as the starting point for explicit solvent. 

Let us describe the simulation protocol step by step. 
The simulation technical details can be found in Ap- 
pendix |XJ 



A. Step 1: flooding with implicit water 

We start sampling in implicit water. A short run at 
T = 3Q0K revealed that the angles fluctuate with an am- 
plitude of w 10 degrees and a characteristic time (taken 
to be the period of oscillations here) of 0.5ps. We thus 
performed a flooding ABMD run with 4A£ = 17° (for both 
torsional angles) and tf = lps. We monitored the tra- 
jectory and continued running until both angles visited 
all the values between m —80° and « +80° (that is, un- 
til the minima corresponding to both the C\ and 1 Ci 
chair conformations and skew-boat configurations were 
"flooded"). The coarse stage took Ins. We then de- 
creased the spatial resolution to 4A£ = 11° and increased 
the timescale to Tp = 50ps and continued the flooding to 
refine the biasing potential obtained in the previous step. 
It took 3ns more of simulation time to cover the desired 
values of ai 2< The biasing potential at this stage is ex- 
pected to provide a relatively good approximation of the 
corresponding free energy. (It is not necessary to com- 
pute the error at this stage since this potential is only an 
approximation to be used in the next step). 

Regarding the choice of the "spatial" resolution 4A£, 
we have to balance two conflicting factors: a smaller res- 
olution allows the identification of the finer details of the 
free energy; yet for a smaller 4A£ (and fixed flooding rate 
Tp), the accuracy of the free energy decreases because 
the system experiences stronger biasing forces pushing it 
out of the thermodynamic equilibrium. We thus choose 
the resolution to be sufficiently small to approximate the 
equilibrium minima well. 



C4 



,OCH 3 

HO ^ 3 




FIG. 2: Ring carbon atom numbering (ring oxygen atom 05 is the atom number zero) and some conformers of the pyranoid 
ring: 4 Ci, 2 S , l C 4 and 1 5s (from left to right). 
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FIG. 3: Free energies (kcal/mol) associated with the 05-C1-C2-C3 (qi) and C3-C4-C5-05 (0:2) dihedral angles for the IdoA 
(left) and GlcA (right) methyl glycosides (shown in FigJTJ in aqueous solution (TIP3P water). 



B. Step 2: flooding with explicit water 



It is reasonable to expect that the biasing potential 
obtained with implicit solvent provides a good approx- 
imation for the free energy in explicit solvent. To ac- 
count for the differences between the two solvent de- 
scriptions, we carried out more flooding in explicit sol- 
vent with t f = 50ps and 4A£ = 11° (i.e., the "fine" 
values used for the implicit solvent simulation). After 
just 5ns, every value of the angles was visited and so we 
stopped the flooding. Next, we performed a 5ns equilib- 
rium MD run biased by the biasing potential computed so 
far. The biased trajectory visited all the values in the 
interval —80° < a.\ 2 < +80°. This means that the bias- 
ing potential approximates the true free energy surface 
to within a few ksT (otherwise the trajectory would get 



trapped somewhere; the fact that sweep the whole 
region implies that the error is of order of a few ksT at 
most). 



C. Step 3: replica exchange with explicit water 

Finally, we set up a replica exchange scheme that sam- 
ples both biased and unbiased configurations. This allows 
for: (a) the computation of the accurate free energy as- 
sociated with the ring puckering states, and (b) the com- 
putation of the unbiased equilibrium averages, such as 
J-couplings, which are of particular interest since they 
allow for a direct comparison with the NMR data. This 
last aspect is of paramount importance for the force-field 
development. 
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FIG. 4: The (•&, ip) angles of pseudo-rotation^ for the IdoA (left) and GlcA (right) methyl glycosides in aqueous solution (TIP3P 
water) at T = 300K (using the Hammer- Aitofi^i equal-area projection). 



As we described in the "Methods" section, we put three 
replicas together: the first replica is not biased, the sec- 
ond one is biased by AC/(£), and the third one is biased 
by the full £/(£). We determined the value of A = 0.7 
numerically solving the equation IZ12 = TZ-23- After a 
short Ins equilibration, we went into "production runs" 
for 100ns trying to exchange a random pair of replicas 
every lps. The observed exchange rates of w 30% were 
in perfect agreement with the expected values (computed 
assuming that /(£) = —U(£)) for both IdoA and GlcA 
cases. 

We emphasize that the simulations in explicit sol- 
vent involve jumps over barriers of order of 6.5kcal/mol 
(« lOksT). In order for conventional parallel tempering 
to be able to sample this system, the highest temperature 
replica would have to run at ~ 10 x 300A'. This is a pro- 
hibitively large number since several hundreds replicas 
running at intermediate temperatures would be needed 
to proxy between the highest and the lowest (room) tem- 
peratures. 

D. Accurate free energies using WHAM 

Since the autocorrelation time for the angles in a fully 
biased MD run is approximately lOps, we regarded the 
samples taken lOps apart during the equilibrium replica 
exchange as statistically "independent". Therefore we 
saved the coordinates from the unbiased replica every 
lOps to compute various equilibrium properties of the 
IdoA and GlcA, reported in the subsection below. We 
used the whole sample from all three replicas and the 
WHAM scheme to recover the accurate free energy, shown 
in FigH 

A crucial part of this step is the selection of the kernel 
bandwidth h in Eq.Qlip. Assuming that the data is sta- 
tistically independent, we computed optimal h by mini- 
mizing the least-squares cross-validation score to get h « 
1.1° for all the runs; we also computed the bandwidths 
by maximizing the likelihood cross-validation score get- 
ting h rj 2.9° for all the runs. The values selected by 
both methods were checked using the Silverman's "vi- 
sual" method and appeared to be too small (a problem 
described, for example, in Refflfjl). Combining all the 



hints we chose h = 5.73°, 8.59° and 11.46° for the unbi- 
ased, intermediate and fully biased replicas, respectively. 

E. Equilibrium properties of the compounds 

We computed the Cremer-Pople puckering 
parameters 4 ^ regarding the 05 oxygen as the ring 
apex (atom number one in the Cremer-Pople formulae) 
and the carbon with the glycosidic linkage (CI) as the 
atom number two. The mean value of the puckering 
amplitude Q turned out to be 0.53A and 0.55A (for IdoA 
and GlcA respectively) with standard deviation ps 0.04A 
for both compounds. Both ?9 and ip pseudo-rotation 
angles display somewhat larger variations and hence are 
shown in FigH] using the Hammer- Aitofl^ 4 - equal-area 
projection: 

,„ , / 2%/2sintfshW2 \/2cosz9 \ 

(1?, ip) -> , — ^^=^^^= . 

V \J\ + sin cost/?/ 2 y]_ + sini?cosi/?/2y 

The ip) values are consistent with the free energies 
associated with the ax,% (see Figj3]): there are two well- 
pronounced clusters near the poles for IdoA correspond- 
ing to the 4 Ci (north pole, 1? = 0°) and 1 Ca (south pole, 
■d = 180°) chair conformations, and just one big clus- 
ter near the north pole ( 4 Ci chair) for the GlcA. The 
boat and twisted boat conformations correspond to the 
equatorial area ($ = 0°) and are indeed considerably 
less frequent than the chairs. For IdoA the majority of 
the "equatorial" samples is around ip — 150° ( 2 Sq skew- 
boat), yet several other twisted boat conformations are 
also present. This is in accord with Ref|H where it has 
been suggested that not only "canonical" boat and skew- 
boat conformations, but also various boat-like intermedi- 
ates, contribute to the conformational equilibrium. The 
most frequent skew-boat conformation for GlcA observed 
to be 1 S 3 (f around -150°). 

In order to identify the pyranose puckering states ex- 
plored during the simulations, one has to choose a cri- 
terion for assigning a particular trajectory section to a 
canonical puckering state. When the temperature is zero, 
the potential energy minima typically define the confor- 
mations in a unique and natural way. At non-zero tern- 
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perature, however, some ambiguity is always present due 
to the thermal fluctuations. In our simulations, the C\ 
and 1 C4 chair conformations are clearly discriminated by 
the signs of the ai,2 angles and hence we assume that the 
equilibrium samples with (ai > 0, «2 < 0) are 4 Gi chairs 
while those with (a\ < 0,«2 > 0) are 1 Ca chairs. The 
boat and skew-boat conformations, on the other hand, 
correspond to angles of the same sign, yet it is difficult 
to distinguish among the different boat and skew-boat 
types using the ai j2 angles alone. The Cremer-Pople pa- 
rameters could potentially discriminate among the boats, 
but the ambiguity still persists as one would have to set 
the boundaries between the conformations. Thus, in the 
following we concentrate on the properties of the chair 
conformations, although our simulations show that dif- 
ferent skew-boat conformations show up at the thermo- 
dynamic equilibrium. In the case of IdoA the majority of 
those are 2 5o-like (left panel in Fig Q] dots in the equa- 
torial area with ip around 150°) and 3 >Si-like (tp around 
30°). This is in agreement with the conclusions of Mul- 
loy and Forster— that the 4 Gi <H> 4 G4 transformation of 
IdoA proceeds through the 2 So and 3 Si intermediates. 

We were able to compute the free energy differences 
between the 4 Ci and states by counting the number 
of configurations N{ot\, a 2 ) with the appropriate signs of 
the ai t 2 angles: 



AG 



-fcsTTn 



-kfiThi 



p( 4 Ci) 

N(ai > 0, a 2 < 0) 
N\ax < 0,a 2 > 0)' 



where p( A C\) and pQ^Cn) denote the probabilities of the 
corresponding conformations. We obtained 



AG 



IdoA 



0.71kcal/mol, AG° C 



GlcA 



-4.49kcal/mol. 



In other words, 1 Cn turns out to be marginally more sta- 
ble than 4 Gi for IdoA, while for GlcA, 4 Gi is considerably 
more stable than 1 d. The preference of 4 G4 over 4 Gi for 
IdoA appears to be largely due to solvation effects - the 
vacuum energies optimized in the GLYCAM 06 force-field 
differ merely by 0.2kcal/mol. On the other hand, the 4 Gi 
has much lower energy than for the GlcA in the gas 
phase and thus the 4 Gi stability in water is more due to 
the "internal" energetics than to solvent interactions or 
temperature. 

Finally, in order to make connection with NMR spectral 
data 3 -, we computed the J-couplings for these molecules. 
Also known as indirect dipole-dipole coupling, a J- 
coupling is the coupling between two nuclear spins caused 
by bonding electrons acting on the magnetic field through 
the two nuclei. J-couplings can be linked to the dihedral 
angles via the Karplus equation 48 : 



3 J HH = A cos 2 ip + B cos ip + C, 



(13) 
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We used the values of the parameters reported in Ref|49j. 
The 3 Jff#-coupling constants are presented in FigJSJ We 



TABLE I: Mean value and standard deviation (in parenthesis) 
of the intra-ring bond and dihedral angles for IdoA and GlcA 
methyl glycosides in 4 Ci and 1 d conformations (degrees). 



have carried out a considerably accurate sampling of 
phase space for these molecules, but naturally the validity 
of the final numbers depend on the validity of the force- 
field and is also limited by the accuracy of the Karplus 
equation Eq . (fl3|l . The distribution of the 3 Jhh values 
for IdoA clearly reflects the contributions from the two 
chair conformations, yet the mean values do not match 
the ones reported in Ref0 exactly. Specifically, the mean 
values of the 3 Jhh for the methyl-a-L-Iduronic acid re- 
ported in Ref[|| (line 1, Table I in that reference) are 
Ji, 2 = 4.9, J 2 , 3 = 6.6, J 3A = 6.0 and J 4j5 = 4.0 Hz 
implying that C\ is favored over ^4 contrary to our 
results. We also note that, due to the bimodal nature of 
the IdoA's 3 Jhh distributions (see FigJSJ, the discrep- 
ancies of the mean values can be deceptive. Unfortu- 
nately, while there are ample experimental data on oligo- 
and polysaccharides, information about their monosac- 
charide units is sketchy and scattered across the scientific 
literature^, which prevents more rigorous comparisons. 



V. SUMMARY 

In this work we carried out simulations of methyl-a-L- 
iduronic and methyl-/3-D-glucuronic acids parametrized 
by the GLYCAM 06 force field in explicit TIP3P water. The 
importance of these small sugars is greatly due to the 
fact that IdoA is the major uronic acid component of the 
glycosaminoglycans dermatan sulfate and heparin, while 
GlcA dominates in heparan sulfate. The main purpose 
of this work was to introduce a procedure that reliably 
recovers the free energy of the solvated molecules and 
samples the solute degrees of freedom in explicit solvent 
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not depend explicitly on the potential energy difference 
between the replicas; it depends, instead, on the differ- 
ences of the biasing potentials, which are functions of the 
solute degrees of freedom only. Therefore, the necessary 
number of replicas is considerably smaller (three repli- 
cas were enough for the pyranose rings considered in this 
paper). 

Once that accurate sampling is not an issue anymore, 
one can really concentrate on problems associated with 
the validity of the force-field representation. In particu- 
lar, one can compare against experimentally measurable 
quantities, such as J-couplings, with the certitude that 
any deviations from experimental values are due to inac- 
curacies of the force-field (of course, assuming that the 
experimental values are correct). Naturally, one can ex- 
tend this study to take into account the effects of different 
densities, salt concentrations, temperature, etc. 
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Appendix A: Simulation details. 



FIG. 5: Distributions of the H-H coupling constants computed 
using the Karplus equation for IdoA (left) and GlcA (right). 
Mean value shown in the insets. 

using very moderate computational resources. 

Succinctly, the protocol involves the following steps: 
(i) use ABMD to compute an approximate free energy for 
the intra-ring dihedral angles in a relatively cheap im- 
plicit solvent; (ii) refine the free energy with ABMD under 
explicit solvent; (iii) use this last free energy to set up a 
Hamiltonian replica exchange scheme that samples both 
from biased and unbiased distributions; and (iv) recover 
the accurate free energies via the WHAM technique applied 
to all the replicas, and compute equilibrium properties of 
the rings from the unbiased trajectories. 

The protocol appears to be slightly more complex than 
the standard parallel tempering method. However, big 
savings in computational resources fully compensate for 
this extra "complication" . The simulations in explicit 
solvent reported in this paper involve jumps over barri- 
ers of order of ~ 10/cbT. An efficient parallel tempering 
setup in this case would have to run the highest temper- 
ature replica at ss 10 x 300K. This rather high temper- 
ature would in turn require several hundred replicas to 
mediate between this highest and the room temperature. 
In our scheme all replicas run at the same temperature, 
and hence the exchange probability given in Eq.© does 



The simulations presented in this paper were per- 
formed using the AMBER lO 5 -^ simulation package. The 
LEaP program from that package was used to prepare 
the initial structures along with the topology /parameter 
files. The GLYCAM 06^ force field was used for the solutes. 

The implicit solvent simulations were carried out using 
the generalized Born model 51,52 , including surface area 
contribution computed using the LCP0 models with the 
surface tension set to 5 x 10~ 3 kcal/mol/A 2 . The GB/SA 
simulations were performed enforcing no cutoff on the 
non-bonded (van der Waals and electrostatics) interac- 
tions. 

The TIP3P water model^ was used for the explicit 
solvent simulations (in each case 645 water molecules 
were added along with a Na + ion to neutralize the sys- 
tem). For the explicit water simulations, we used peri- 
odic boundary conditions with fixed box size (27A in all 
three directions) chosen to render the density equal to 
w lg/cm 3 . The Particle Mesh Ewald (PME)^i^ method 
was used with the direct space cutoff set to 9A and a 
32 x 32 x 32 grid for the smooth part of the Ewald sum. 

The lengths of all bonds that contain hydrogen were 
fixed via the SHAKE algorithm with the tolerance set 
to 10 _6 A. Langevin dynamics^ with collision frequency 
7 = lps -1 was used to keep the temperature at 300K. A 
different random number generator seed was set for every 
run. 



11 



1 T. W. Rademacher, R. B. Parekh, and R. A. Dwek, Annu. 
Rev. Biochem. 57, 785 (1988). 

2 A. Varki, R. Cummings, J. Esko, H. Freeze, G. Hart, and 
J. Marth, eds., Essentials of Glycobiology (Cold Spring 
Harbor Laboratory Press, New York, 2009). 

3 D. R. Ferro, A. Provasoli, M. Ragazzi, B. Casu, G. Torri, 
V. Bossennec, B. Perly, P. Sinay, M. Petitou, and J. Choay, 
Carbohydrate Research 195, 157 (1990). 

4 V. S. R. Rao, P. K. Qasba, P. V. Balaji, and R. Chan- 
drasekaran, Conformation of Carbohydrates (Harwood 
Academic Publishers, Amsterdam, 1998). 

5 K. N. Kirschner and R. J. Woods, Proc. Natl. Acad. Sci. 
98, 10541 (2001). 

6 A. Almond and J. K. Sheehan, Glycobiology 13, 255 
(2003). 

7 V. Krautler, M. Miiller, and P. H. Hiinenberger, Carbohy- 
drate Research 342, 2097 (2007). 

8 K. N. Kirschner, A. B. Yongye, S. M. Tschampel, 
J. Gonzalez-Outeirino, C. R. Daniels, B. L. Foley, and R. J. 
Woods, J. Comp. Chem. 29, 622 (2008). 

9 M. Christen and W. F. van Gunsteren, J. Comp. Chem. 
29, 157 (2008). 

10 C. J. Geyer, in Computing Science and Statistics: The 23rd 
symposium on the interface (Interface Foundation, Fairfax, 
1991), pp. 156 - 163. 

11 V. Babin, C. Roland, and C. Sagui, J. Chem. Phys. 128, 
134101 (pages 7) (2008). 

12 D. A. Kofke, J. Chem. Phys. 117, 6911 (2002). 

13 D. A. Kofke, J. Chem. Phys. 120, 10852 (2004). 

14 T. Lelievre, M. Rousset, and G. Stoltz, J. Chem. Phys. 
126, 134111 (2007). 

15 G. Bussi, A. Laio, and M. Parrinello, Phys. Rev. Lett. 96, 
090601 (2006). 

16 T. Huber, A. E. Torda, and W. F. van Gunsteren, J. Corn- 
put. Aided. Mol. Des. 8, 695 (1994). 

17 E. Darve and A. Pohorille, J. Chem. Phys. 115, 9169 
(2001). 

18 F. Wang and D. P. Landau, Phys. Rev. Lett. 115, 2050 

(2001) . 

19 A. Laio and M. Parrinello, Proc. Natl. Acad. Sci. 99, 12562 

(2002) . 

20 M. Iannuzzi, A. Laio, and M. Parrinello, Phys. Rev. Lett. 
90, 238302 (2003). 

21 V. Babin, C. Roland, T. A. Darden, and C. Sagui, J. Chem. 
Phys. 125, 204909 (2006). 

22 H. S. Hansen and P. H. Hiinenberger, J. Comp. Chem. 
9999, e (2009). 

23 V. Spiwok and I. Tvaroska, J. Phys. Chem. B 113, 9589 
(2009). 

24 V. Spiwok, B. Kralov; and I. Tvaroska, Carbohydrate Re- 
search 345, 530 (2010). 

25 A. M. Ferrenberg and R. H. Swendsen, Phys. Rev. Lett. 
63, 1195 (1989). 

26 S. Kumar, J. M. Rosenberg, D. Bouzida, R. H. Swendsen, 
and P. A. Kollman, J. Comp. Chem. 13, 1011 (1992). 

27 A. Briinger, C. L. Brooks, and M. Karplus, Chem. Phys. 
Lett. 105, 495 (1984), ISSN 0009-2614. 



28 Y. Sugita, A. Kitao, and Y. Okamoto, J. Chem. Phys. 113, 
6042 (2000). 

29 D. Frcnkcl and B. Smit, Understanding Molecular Sim- 
ulation, Computational Science Series (Academic Press, 
2002). 

30 B. W. Silverman, Density Estimation for Statistics and 
Data Analysis, Monographs on statistics and applied prob- 
ability (Chapman and Hall, 1986). 

31 C. Dc Boor, A practical guide to splines (Springer- 
Verlag, New York, 1978), ISBN 0387903569 3540903569 
0387903569 3540903569. 

32 C. Bartels and M. Karplus, J. Comp. Chem. 18, 1450 
(1997). 

33 C. Bartels, Chem. Phys. Lett. 331, 446 (2000). 

34 M. Habeck, Phys. Rev. Lett. 98, 200601 (pages 4) (2007). 

35 B. Roux, Computer Physics Communications 91, 275 
(1995). 

36 I. J. Good and R. A. Gaskins, Biometrika 58, 255 (1971). 

37 B. W. Silverman, Annals of Statistics 10, 795 (1982). 

38 J. Nocedal, Mathematics of Computation 35, 773 (1980). 

39 D. C. Liu and J. Nocedal, Math. Program. 45, 503 (1989). 

40 W. L. Jorgensen, J. Chandrasekhar, J. Madura, and M. L. 
Klein, J. Chem. Phys. 79, 926 (1983). 

41 J. F. Stoddart, Stereochemistry of Carbohydrates (John 
Wiley & Sons Inc, 1971). 

42 M. J. Forster and B. Mulloy, Biopolymers 33, 575 (1993). 

43 D. Cremer and J. A. Pople, J. Am. Chem. Soc. 97, 1354 
(1975). 

44 J. P. Snyder, Flattening the earth : two thousand years 
of map projections (University of Chicago Press, Chicago, 
1997). 

45 C. R. Loader, Annals of Statistics 27, 415 (1999). 

46 S. Ernst, G. Venkataraman, V. Sasisekharan, R. Langer, 
C. L. Cooney, and R. Sasisekharan, J. Am. Chem. Soc. 
120, 2099 (1998). 

47 B. Mulloy and M. J. Forster, Glycobiology 10, 1147 
(2000). 

48 M. Karplus, J. Am. Chem. Soc. 85, 2870 (1963). 

49 C. A. G. Haasnoot, F. A. A. M. de Leeuw, and C. Altona, 
Tetrahedron 36, 2783 (1980). 

50 D. Case, T. Darden, T. Cheatham, III, C. Simmerling, 
J. Wang, R. Duke, R. Luo, M. Crowley, R. andW. Zhang, 
et al., AMBER 10 (University of California, San Francisco, 
2008). 

61 A. Onufriev, D. Bashford, and D. A. Case, J. Phys. Chem. 
B 104, 3712 (2000). 

52 A. Onufriev, D. Bashford, and D. A. Case, Proteins 55, 
383 (2004). 

53 J. Weiser, P. S. Shenkin, and W. C. Still, J. Comp. Chem. 
20, 217 (1999). 

54 T. A. Darden, D. M. York, and L. G. Pedersen, J. Chem. 
^ Phys. 98, 10089 (1993). 

55 U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, 
H. Lee, and L. G. Pedersen, J. Chem. Phys. 103, 8577 
(1995). 



